ENTRIES

Abstract. We consider nonnegative integer matrices with specified row and column
sums and upper bounds on the entries. We show that the logarithm of the number of
such matrices is approximated by a concave function of the row and column sums.
We give efficiently computable estimators for this function, including one
suggested by a maximum-entropy random model; we show that these estimators are
asymptotically exact as the dimension of the matrices goes to $\infty$. We finish
by showing that, for $\kappa \ge 2$ and for sufficiently small row and column sums,
the number of matrices with these row and column sums and with entries
$\le \kappa$ is greater by an exponential factor than predicted by a heuristic of
independence.



1. Main objects and results 

A contingency table is defined as a nonnegative integer matrix with specified row
and column sums. Specifically, given

\[ R = (r_1, r_2, \ldots, r_m) \in \mathbb{Z}^m_{\ge 0} \quad\text{and}\quad C = (c_1, c_2, \ldots, c_n) \in \mathbb{Z}^n_{\ge 0} \]

such that

\[ r_1 + r_2 + \cdots + r_m = c_1 + c_2 + \cdots + c_n = N, \]

we denote by $\Pi(R, C)$ the set of all $A = [a_{ij}] \in \mathbb{R}^{m \times n}_{\ge 0}$ such that

\[ \sum_{j=1}^n a_{ij} = r_i \ (1 \le i \le m) \quad\text{and}\quad \sum_{i=1}^m a_{ij} = c_j \ (1 \le j \le n). \]

Then $\Pi(R, C)$ is a convex polytope, and its integer points are known as the
contingency tables with margins R and C; we denote the number of such tables by
$T(R, C)$.

Given a matrix $K = (k_{ij}) \in (\mathbb{Z}_{\ge 0} \cup \{\infty\})^{m \times n}$, define

\[ \Pi_K(R, C) := \{A \in \Pi(R, C) : a_{ij} \le k_{ij} \ \forall i, j\}. \]

Thus the integer points of $\Pi_K(R, C)$ are the contingency tables with margins R and
C, bounded entrywise by K. We denote the number of such tables by $T_K(R, C)$.

We will be particularly concerned with the case in which K is constant, with all
entries equal to $\kappa \in \mathbb{Z}_{>0} \cup \{\infty\}$, in which case we abuse the notation slightly by
writing $\Pi_\kappa(R, C)$ in lieu of $\Pi_K(R, C)$; $T_\kappa(R, C)$ in lieu of $T_K(R, C)$; and so on.
The integer points of $\Pi_\kappa(R, C)$ are contingency tables with margins R and C and
entries restricted to $\{0, 1, 2, \ldots, \kappa\}$. When $\kappa = 1$, these tables are known as 0-1
tables. When $\kappa = \infty$, it is to be understood that $\{0, 1, 2, \ldots, \kappa\}$ signifies $\mathbb{Z}_{\ge 0}$, so
that $\Pi_\infty(R, C) := \Pi(R, C)$.

Contingency tables arise in statistics, where they are used to represent the joint 
distribution of two categorical variables in a sample. In this context, one may wish 
to test how significantly an observed table A deviates from a "typical" table with 
the observed margins. A plausible candidate for the "typical" table is the rank 1
table,

\[ \left( \frac{r_i c_j}{N} \right)_{1 \le i \le m,\ 1 \le j \le n}. \]

Deviation from this table tests a hypothesis of independence between the two
categorical variables.

In a similar spirit, Good [12] proposed the following heuristic for estimating
$T(R, C)$: Consider the set of $m \times n$ nonnegative integer matrices with sum of
entries equal to N; there are $\binom{N + mn - 1}{mn - 1}$ such matrices. Equip this set
with the uniform probability measure. Then the probability that a random sample
from this set has row margin R is

\[ \binom{N + mn - 1}{mn - 1}^{-1} \prod_{i=1}^m \binom{r_i + n - 1}{n - 1}, \]

while the probability that a random sample has column margin C is

\[ \binom{N + mn - 1}{mn - 1}^{-1} \prod_{j=1}^n \binom{c_j + m - 1}{m - 1}. \]

If these two events were independent, then the number of tables satisfying both
constraints would be

\[ I(R, C) := \binom{N + mn - 1}{mn - 1}^{-1} \prod_{i=1}^m \binom{r_i + n - 1}{n - 1} \prod_{j=1}^n \binom{c_j + m - 1}{m - 1}. \]
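As a hedged illustration (not part of the original argument), the following Python sketch compares a brute-force count of $T(R, C)$ with the independence estimate obtained by multiplying the number of matrices by the two probabilities above. The function names `compositions`, `count_tables`, and `good_estimate` are our own, and the brute-force count is only feasible for very small margins.

```python
from itertools import product
from math import comb

def compositions(total, length):
    """Yield all nonnegative integer vectors of the given length and sum."""
    if length == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, length - 1):
            yield (first,) + rest

def count_tables(R, C):
    """Brute-force T(R, C): enumerate rows with the right sums, test column sums."""
    if sum(R) != sum(C):
        return 0
    n = len(C)
    row_choices = [list(compositions(r, n)) for r in R]
    return sum(
        1
        for rows in product(*row_choices)
        if all(sum(row[j] for row in rows) == C[j] for j in range(n))
    )

def good_estimate(R, C):
    """Independence estimate: (#matrices) * Pr[row margin R] * Pr[column margin C]."""
    m, n, N = len(R), len(C), sum(R)
    total = comb(N + m * n - 1, m * n - 1)   # matrices with entry sum N
    row_ways = 1
    for r in R:
        row_ways *= comb(r + n - 1, n - 1)   # rows of length n with sum r
    col_ways = 1
    for c in C:
        col_ways *= comb(c + m - 1, m - 1)   # columns of length m with sum c
    return row_ways * col_ways / total
```

For instance, `count_tables((1, 1), (1, 1))` and `good_estimate((1, 1), (1, 1))` can be compared directly; the estimate is a real number and need not be an integer.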

In [8], see also [9], Diaconis and Efron criticized the choice of the rank 1 table
as typical, and discussed alternatives to the independence hypothesis, including a
hypothesis that A is drawn uniformly at random from the set of tables with the
observed margins. This creates the need for a different notion of the "typical"
table, which, it turns out, may diverge dramatically from the rank 1 table (see [5]).

Using the theory of permanents and matrix scaling, Barvinok estimated $T(R, C)$ [2]
and $T_1(R, C)$ [?] and showed that $T(R, C)$ is approximately log-concave [1].
Barvinok also showed [3] that, in a certain asymptotic sense (see Definition 2,
"Cloning"), $T(R, C)$ is almost always severely underestimated by $I(R, C)$. That is,
for most values of R and C, when m and n are large, the conditions

\[ \sum_{j=1}^n a_{ij} = r_i \ (1 \le i \le m) \quad\text{and}\quad \sum_{i=1}^m a_{ij} = c_j \ (1 \le j \le n) \]

are strongly positively correlated. Yet if we define

\[ I_1(R, C) := \binom{mn}{N}^{-1} \prod_{i=1}^m \binom{n}{r_i} \prod_{j=1}^n \binom{m}{c_j} \]

by analogy to $I(R, C)$, then it turns out that $I_1(R, C)$ typically overestimates
$T_1(R, C)$; that is, for 0-1 tables, the conditions

\[ \sum_{j=1}^n a_{ij} = r_i \ (1 \le i \le m) \quad\text{and}\quad \sum_{i=1}^m a_{ij} = c_j \ (1 \le j \le n) \]

are strongly negatively correlated [1] for most values of R and C.

The aim of this paper is to extend these results to the case of general $\kappa$, and in
particular to examine the transition between the positive and negative correlations
described above. Although asymptotic positive correlation does not occur for any
margins $(R, C)$ when $\kappa = 1$, we will show that it does occur for some margins
whenever $\kappa \ge 2$.

1.1. How to read this paper. The four principal theorems of the paper are stated
in sections 1.2 and 1.5. Each is proven in its own section (2.1, 2.4, 3.2, 3.4). The
intervening sections are thematic, each introducing a concept which will be used in 
the statements and proofs of the four theorems. These sections contain definitions, 
lemmas, and proofs of lemmas, as well as informal motivating remarks; they can be 
read linearly as a "story," or merely scanned for their formal content. The reader in 
a hurry is advised to browse these sections for their definitions, then proceed directly 
to the proof sections, referring back to the thematic sections as needed (e.g., for the 
statement of lemmas cited in the proofs). 

1.2. Results: part 1. We begin by adapting a "Brunn-Minkowski-type" result of
Barvinok [1] to prove approximate log-concavity for $T_K(R, C)$. Using the following
definitions:

Definition 1. For a vector or matrix V, let $|V|$ denote the sum of the entries of V.
For an integer $n \ge 0$, let $\omega(n) := n^n / n!$ (agreeing that $0^0 = 1$). For a vector or
matrix V with nonnegative integer entries, let $\Omega(V)$ denote the product of
$\omega(v)$ over all entries v of V.

we can state:
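Theorem 1. Let $\alpha_1 + \alpha_2 + \cdots + \alpha_p = 1$ ($\alpha_1, \ldots, \alpha_p \ge 0$). Let
$R^1, \ldots, R^p \in \mathbb{Z}^m_{\ge 0}$ and $C^1, \ldots, C^p \in \mathbb{Z}^n_{\ge 0}$, such that
$|R^1| = \cdots = |R^p| = |C^1| = \cdots = |C^p| = N$. Let $K \in \mathbb{Z}^{m \times n}_{\ge 0}$. Define

\[ R := \alpha_1 R^1 + \alpha_2 R^2 + \cdots + \alpha_p R^p \quad\text{and}\quad C := \alpha_1 C^1 + \alpha_2 C^2 + \cdots + \alpha_p C^p. \]

Also, define vectors $\bar C^1, \bar C^2, \ldots, \bar C^p, \bar C \in \mathbb{Z}^n$ by

\[ \bar c^{\,t}_j := \sum_{i=1}^m k_{ij} - c^t_j \]

and

\[ \bar c_j := \sum_{i=1}^m k_{ij} - c_j = \alpha_1 \bar c^{\,1}_j + \cdots + \alpha_p \bar c^{\,p}_j. \]

(The coordinates of vectors with uppercase names are indicated by lowercase letters:
e.g., $C^t = (c^t_1, \ldots, c^t_n)$.)

Then

\[ \frac{\omega(|K|)\, T_K(R, C)}{\Omega(R)\, \Omega(\bar C)\, \Omega(K)} \ \ge\ \prod_{t=1}^p \left[ \frac{T_K(R^t, C^t)}{\min\{\Omega(R^t)\Omega(\bar C^t),\ \Omega(K)\}} \right]^{\alpha_t}. \]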



This theorem is somewhat opaque in itself, due to the confounding factors $\Omega(R^t)$,
$\Omega(\bar C^t)$, etc. However, some analysis reveals that these factors typically grow more
slowly than the numbers $T_K$. For a precise statement, we follow [3] and introduce
the following asymptotic regime.

Definition 2 (Cloning). Let

\[ R = (r_1, \ldots, r_m) \in \mathbb{Z}^m_{\ge 0} \quad\text{and}\quad C = (c_1, \ldots, c_n) \in \mathbb{Z}^n_{\ge 0}. \]

Then we define

\[ R^{(s)} := (s r_1, \ldots, s r_m,\ s r_1, \ldots, s r_m,\ \ldots,\ s r_1, \ldots, s r_m) \]

and

\[ C^{(s)} := (s c_1, \ldots, s c_n,\ s c_1, \ldots, s c_n,\ \ldots,\ s c_1, \ldots, s c_n), \]

where the number of repetitions is s (thus $R^{(s)} \in \mathbb{Z}^{sm}_{\ge 0}$ and $C^{(s)} \in \mathbb{Z}^{sn}_{\ge 0}$). We refer to
these vectors as the s-fold clonings of R and C.

If $K \in \mathbb{Z}^{m \times n}_{\ge 0}$ then we define $K^{(s)}$ as the $sm \times sn$ matrix of form

\[ \begin{pmatrix} K & K & \cdots & K \\ K & K & \cdots & K \\ \vdots & \vdots & \ddots & \vdots \\ K & K & \cdots & K \end{pmatrix} \]

(with s blocks in either direction). We call this the s-fold cloning of K.
Note that the clonings are defined so that, if A is a contingency table with margins
R and C, then the $sm \times sn$ matrix

\[ \begin{pmatrix} A & A & \cdots & A \\ A & A & \cdots & A \\ \vdots & \vdots & \ddots & \vdots \\ A & A & \cdots & A \end{pmatrix} \]

has margins $R^{(s)}$ and $C^{(s)}$.
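The cloning construction is easy to check mechanically. The following Python sketch is an illustration under our own naming (`clone_vector`, `clone_matrix`, and `margins` are hypothetical helpers, not notation from the paper); it builds the s-fold clonings and confirms that the block matrix of copies of A has the cloned margins.

```python
def clone_vector(v, s):
    """s-fold cloning of a margin: (s*v_1, ..., s*v_k), repeated s times."""
    return [s * x for x in v] * s

def clone_matrix(A, s):
    """Block matrix consisting of s x s copies of A (A is a list of rows)."""
    return [list(row) * s for _ in range(s) for row in A]

def margins(A):
    """Return (row sums, column sums) of a matrix given as a list of rows."""
    return [sum(row) for row in A], [sum(col) for col in zip(*A)]

# Example: a 2 x 3 table and its 2-fold cloning.
A = [[1, 0, 2],
     [0, 3, 1]]
R, C = margins(A)
cloned_R, cloned_C = margins(clone_matrix(A, 2))
```

Here `cloned_R` and `cloned_C` agree with `clone_vector(R, 2)` and `clone_vector(C, 2)`, since each block row contains s copies of a row of A.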
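Now we can state

Theorem 2. Let $R \in \mathbb{Z}^m_{\ge 0}$, $C \in \mathbb{Z}^n_{\ge 0}$, and $K \in \mathbb{Z}^{m \times n}_{\ge 0}$. Assume that $T_K(R, C) > 0$,
that is, there is at least one contingency table with margins R and C, bounded
entrywise by K. Then

\[ \lim_{s \to \infty} \frac{1}{s^2} \ln T_{K^{(s)}}(R^{(s)}, C^{(s)}) = \ln \left( \inf_{x_1, \ldots, x_m,\, y_1, \ldots, y_n > 0} \frac{\mathcal{G}(x, y)}{x^R y^C} \right) \]

where

\[ \mathcal{G}(x, y) := \prod_{i=1}^m \prod_{j=1}^n \left[ 1 + x_i y_j + (x_i y_j)^2 + \cdots + (x_i y_j)^{k_{ij}} \right] \]

(and $x^R$, $y^C$ denote $x_1^{r_1} x_2^{r_2} \cdots x_m^{r_m}$, $y_1^{c_1} y_2^{c_2} \cdots y_n^{c_n}$).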

To turn this into a more usable estimate, we employ some concepts from probability 
theory. 

1.3. Maximum-entropy models. How might we "approximate" the uniform
distribution on $\Pi = \Pi_K(R, C)$? One naive approach is to construct a random matrix
with entries bounded by K which satisfies the row and column constraints on
average, that is, in expectation. Among all possible distributions for this random
model, one compelling choice is that which achieves the maximum entropy. Such a
distribution necessarily assigns equal mass to all bona fide integer points of $\Pi$,
while also awarding some mass to impostors outside $\Pi$. Thus the entropy of the
random model overestimates the entropy of the uniform distribution on
$\Pi \cap \mathbb{Z}^{m \times n}$, and so provides an upper bound on $\ln T_K(R, C)$. Theorem 3 expresses that
this upper bound is actually a tolerably good estimate. The great advantage of the
random model is that it is readily computable, with all coordinates being mutually
independent.

The reader interested in the general applicability of this method to integer points
of polytopes is directed to [6], [16]. The underlying "maximum-entropy principle"
has a long history (see for example [?], [15], [13]), and we do not address its overall
merits here; it is not a formal prerequisite for any of the results to follow or for their
proofs, but it does lend them intuitive plausibility.

The following definitions are prerequisites for a statement of the results. 
Definition 3. A random variable X is truncated geometric with support
$\{0, 1, 2, \ldots, \kappa\}$ if there are parameters $p \in (0, 1]$ and $q \in [0, \infty)$, such that

\[ \Pr[X = t] = p q^t \quad\text{for } t = 0, 1, \ldots, \kappa. \]

For symmetry, we also say that X is truncated geometric with parameters $p = 0$ and
$q = \infty$ if $\Pr[X = \kappa] = 1$; however, in what follows, explicit treatment of this case
will sometimes be left to the reader.

When $\kappa = \infty$, we have already indicated that $\{0, 1, 2, \ldots, \kappa\}$ is to be interpreted as
$\mathbb{Z}_{\ge 0} = \{0, 1, 2, \ldots\}$. A random variable on this support is geometric if there are
parameters $p \in (0, 1]$ and $q \in [0, 1)$ (in this case necessarily satisfying $p + q = 1$),
such that

\[ \Pr[X = t] = p q^t \quad\text{for } t = 0, 1, 2, \ldots. \]

To avoid unnecessary duplication of results, we regard this as a special case of the
truncated geometric distribution.

Given $\kappa \in \mathbb{Z}_{>0}$ and $x \in [0, \kappa]$, or $\kappa = \infty$ and $x \in [0, \infty)$, there is a unique
truncated geometric distribution with support $\{0, 1, 2, \ldots, \kappa\}$ and expected value
equal to x.

Definition 4. We denote this distribution by $TG(x; \kappa)$, and its parameters (as in
Definition 3) by $p(x; \kappa)$ and $q(x; \kappa)$.

The parameters $p = p(x; \kappa)$ and $q = q(x; \kappa)$ are given implicitly by the equations

\[ 1 = p(1 + q + q^2 + \cdots + q^\kappa), \qquad (1) \]

\[ x = p(q + 2q^2 + \cdots + \kappa q^\kappa), \qquad (2) \]

which, to the author's knowledge, cannot be neatly solved in general (but see
section 3 for a discussion of the simplest cases, $\kappa = 1$ and $\kappa = \infty$).
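Although equations (1) and (2) have no neat closed-form solution in general, they are straightforward to solve numerically. The Python sketch below is one possible approach, offered as an illustration only: it assumes finite $\kappa$ and $0 < x < \kappa$, uses the fact that the mean is increasing in q, and bisects on $\ln q$. The function name `tg_params`, the bracket $[-60, 60]$ for $\ln q$, and the tolerance are our own choices.

```python
import math

def tg_params(x, kappa, tol=1e-12):
    """Solve equations (1)-(2) for (p, q), given 0 < x < kappa and finite kappa."""
    if not (0 < x < kappa):
        raise ValueError("this sketch handles only 0 < x < kappa")

    def mean(log_q):
        # Mean of the law with weights q^t on {0, ..., kappa}, computed stably.
        top = max(0.0, kappa * log_q)
        w = [math.exp(t * log_q - top) for t in range(kappa + 1)]
        return sum(t * wt for t, wt in enumerate(w)) / sum(w)

    lo, hi = -60.0, 60.0          # assumed bracket for ln q
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(mid) < x:         # the mean increases with q
            lo = mid
        else:
            hi = mid
    q = math.exp((lo + hi) / 2)
    p = 1.0 / sum(q ** t for t in range(kappa + 1))   # equation (1)
    return p, q
```

For $\kappa = 1$ this should reproduce the explicit solution $p = 1 - x$, $q = x/(1 - x)$; for means extremely close to 0 or $\kappa$ the fixed bracket may be too narrow.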

Recall that the entropy of a discrete random variable X (here normalized to base
e) is given by

\[ H[X] := - \sum_{x \in \operatorname{supp}(X)} \Pr[X = x] \ln \Pr[X = x]. \]

Among all probability distributions with support in $\{0, 1, 2, \ldots, \kappa\}$ and given
expectation x, the greatest entropy is achieved by $TG(x; \kappa)$, as is well-known (and
may be readily proved by the method of Lagrange multipliers). Hence

Definition 5. Given $\kappa \in \mathbb{Z}_{>0}$ and $x \in [0, \kappa]$, or $\kappa = \infty$ and $x \in [0, \infty)$, let
$H^{\max}_\kappa(x)$ denote the entropy of $TG(x; \kappa)$.

We regard $H^{\max}_\kappa$ as a function on $[0, \kappa]$. We break off a discussion of its formula and
basic properties into section 3, having said all that is needed about $H^{\max}_\kappa$ in order
to state the paper's remaining main results.
1.4. The independence heuristic for $\Pi_\kappa(R, C)$. Inspired by Good's estimate
$I(R, C)$, we propose and consider an estimate $I_\kappa(R, C)$ for the number of contingency
tables in $\Pi_\kappa(R, C)$ when $1 \le \kappa \le \infty$. (Technically, $\kappa = 0$ is allowable in all that
follows, though the consequences are trivial.)

Definition 6 ("$(\kappa + 1)$-nomial coefficients" [?]). Let $\kappa$ be a positive integer. For
integers $n \ge 0$ and $0 \le r \le n\kappa$, we denote by $\binom{n}{r}_\kappa$ the coefficient of $x^r$ in the
polynomial expansion of $(1 + x + x^2 + \cdots + x^\kappa)^n$.

For integers $n \ge 0$ and $r \ge 0$ we define $\binom{n}{r}_\infty$ to be the coefficient of $x^r$ in the power
series expansion of $(1 + x + x^2 + \cdots)^n$.

Note that $\binom{n}{r}_1 = \binom{n}{r}$ and $\binom{n}{r}_\infty = \binom{n + r - 1}{r} = \binom{n + r - 1}{n - 1}$. For $\kappa \ne 1, \infty$, there is (to
the author's knowledge) no comparably neat exact formula for $\binom{n}{r}_\kappa$.

Consider the uniform probability measure on the set of $m \times n$ matrices with entries
in $\{0, 1, 2, \ldots, \kappa\}$ and sum of entries equal to N. The number of such matrices
is $\binom{mn}{N}_\kappa$. The probability that a random sample from this set has row margin R
is

\[ \binom{mn}{N}_\kappa^{-1} \prod_{i=1}^m \binom{n}{r_i}_\kappa; \]

the probability that a random sample has column margin C is

\[ \binom{mn}{N}_\kappa^{-1} \prod_{j=1}^n \binom{m}{c_j}_\kappa. \]

Thus if these two events were independent, then the number of tables satisfying
both constraints would be given by

Definition 7.

\[ I_\kappa(R, C) := \binom{mn}{N}_\kappa^{-1} \prod_{i=1}^m \binom{n}{r_i}_\kappa \prod_{j=1}^n \binom{m}{c_j}_\kappa. \]

Just as $T_\kappa(R, C)$ specializes to $T(R, C)$ when $\kappa = \infty$, note that $I_\infty(R, C) = I(R, C)$.
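For finite $\kappa$ the coefficients of Definition 6 can be computed by repeated polynomial multiplication, which also makes the estimate of Definition 7 computable for small examples. The Python sketch below is illustrative only; the names `knomial_row`, `knomial`, `independence_estimate`, and `count_bounded_tables` are ours, and the brute-force counter is exponential in mn.

```python
from itertools import product

def knomial_row(n, kappa):
    """Coefficients of (1 + x + ... + x^kappa)^n, indexed by r = 0, ..., n*kappa."""
    coeffs = [1]
    for _ in range(n):
        new = [0] * (len(coeffs) + kappa)
        for i, c in enumerate(coeffs):
            for t in range(kappa + 1):
                new[i + t] += c
        coeffs = new
    return coeffs

def knomial(n, r, kappa):
    """The (kappa+1)-nomial coefficient; zero outside the range 0 <= r <= n*kappa."""
    row = knomial_row(n, kappa)
    return row[r] if 0 <= r < len(row) else 0

def independence_estimate(R, C, kappa):
    """The estimate of Definition 7 (assumes finite kappa and |R| = |C|)."""
    m, n, N = len(R), len(C), sum(R)
    value = 1.0 / knomial(m * n, N, kappa)
    for r in R:
        value *= knomial(n, r, kappa)
    for c in C:
        value *= knomial(m, c, kappa)
    return value

def count_bounded_tables(R, C, kappa):
    """Brute-force count of tables with margins R, C and entries in {0..kappa}."""
    m, n = len(R), len(C)
    count = 0
    for entries in product(range(kappa + 1), repeat=m * n):
        A = [entries[i * n:(i + 1) * n] for i in range(m)]
        if [sum(row) for row in A] == list(R) and [sum(col) for col in zip(*A)] == list(C):
            count += 1
    return count
```

Comparing `count_bounded_tables(R, C, kappa)` with `independence_estimate(R, C, kappa)` on tiny margins gives a feel for the size of the discrepancy, though such small cases say nothing about the asymptotic regime studied in the paper.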

1.5. Results: part 2. We show the following "log-asymptotic" formulas for
$T_\kappa(R, C)$ and $I_\kappa(R, C)$:

Theorem 3. Let $R \in \mathbb{Z}^m_{\ge 0}$, $C \in \mathbb{Z}^n_{\ge 0}$, and $\kappa \in \mathbb{Z}_{>0} \cup \{\infty\}$. Assume $|R| = |C| = N$,
and assume that $T_\kappa(R, C) > 0$. Recall the definition of the cloned margins $R^{(s)}$, $C^{(s)}$
(Definition 2). Then:

(i)

\[ \lim_{s \to \infty} \frac{1}{s^2} \ln T_\kappa(R^{(s)}, C^{(s)}) = \max_{Z \in \Pi_\kappa(R, C)} \sum_{i=1}^m \sum_{j=1}^n H^{\max}_\kappa(z_{ij}). \]

(ii)

\[ \lim_{s \to \infty} \frac{1}{s^2} \ln I_\kappa(R^{(s)}, C^{(s)}) = n \sum_{i=1}^m H^{\max}_\kappa\!\left(\frac{r_i}{n}\right) + m \sum_{j=1}^n H^{\max}_\kappa\!\left(\frac{c_j}{m}\right) - mn\, H^{\max}_\kappa\!\left(\frac{N}{mn}\right). \]

We pause to note that the right-hand side of Theorem 3(i) is efficiently computable,
as it is the maximum of a strictly concave function over a convex polytope; see
Lemma 5. This expression represents the entropy of the "random model" described
at the beginning of section 1.3.

Theorem 3 equips us to prove the following positive correlation result:

Theorem 4. Continue the assumptions of Theorem 3; further, suppose $\kappa \ge 2$. Then
there exists $\delta = \delta(\kappa) \in (0, 1)$, such that if $(R, C)$ satisfy

\[ \left( \max_{1 \le i \le m} r_i \right) \left( \max_{1 \le j \le n} c_j \right) \le \delta \kappa N \]

then

\[ \lim_{s \to \infty} \frac{1}{s^2} \ln T_\kappa(R^{(s)}, C^{(s)}) \ \ge\ \lim_{s \to \infty} \frac{1}{s^2} \ln I_\kappa(R^{(s)}, C^{(s)}), \]

with strict inequality if neither R nor C is a constant vector (i.e., if it is not the
case that $r_1 = \cdots = r_m$ or $c_1 = \cdots = c_n$).

We do not present a corresponding negative correlation result. Some commentary
on the prospects for such a result can be found in section 3.5.



2. Approximate log-concavity for $T_K(R, C)$ and its consequences

The following "Brunn-Minkowski-type inequality" is proven in [1]:

Theorem 5 (Barvinok). Let $\alpha_1 + \alpha_2 + \cdots + \alpha_p = 1$ ($\alpha_1, \ldots, \alpha_p \ge 0$). Let
$R^1, \ldots, R^p \in \mathbb{Z}^m_{\ge 0}$ and $C^1, \ldots, C^p \in \mathbb{Z}^n_{\ge 0}$, such that
$|R^1| = \cdots = |R^p| = |C^1| = \cdots = |C^p| = N$. Let $W \in \mathbb{Z}^{m \times n}_{\ge 0}$. Define

\[ R := \alpha_1 R^1 + \alpha_2 R^2 + \cdots + \alpha_p R^p \quad\text{and}\quad C := \alpha_1 C^1 + \alpha_2 C^2 + \cdots + \alpha_p C^p \]

and

\[ T(R, C; W) := \sum_{A \in \mathbb{Z}^{m \times n} \cap \Pi(R, C)} \ \prod_{i=1}^m \prod_{j=1}^n w_{ij}^{a_{ij}}. \]

Then

\[ \frac{\omega(N)\, T(R, C; W)}{\Omega(R)\, \Omega(C)} \ \ge\ \prod_{t=1}^p \left[ \frac{T(R^t, C^t; W)}{\min\{\Omega(R^t),\ \Omega(C^t)\}} \right]^{\alpha_t}. \]

(See Definition 1 for the meaning of the functions $\omega$, $\Omega$.)

The matrix W may be thought of as attaching weights to the positions of an $m \times n$
matrix. If W is a 0-1 matrix, then $T(R, C; W)$ counts contingency tables with
enforced zeroes in the positions such that $w_{ij} = 0$. We easily prove Theorem 1
by recasting the margin and K-boundedness conditions as a regime of enforced
zeroes (in a larger matrix).

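2.1. Proof of Theorem 1. Assume the hypotheses of Theorem 1. Define vectors
$\mathcal{R}^1, \mathcal{R}^2, \ldots, \mathcal{R}^p, \mathcal{R} \in \mathbb{Z}^{m+n}_{\ge 0}$ and $\mathcal{C} \in \mathbb{Z}^{mn}_{\ge 0}$ by

\[ \mathcal{R}^t := (r^t_1, \ldots, r^t_m,\ \bar c^{\,t}_1, \ldots, \bar c^{\,t}_n), \]

\[ \mathcal{R} := (r_1, \ldots, r_m,\ \bar c_1, \ldots, \bar c_n), \]

\[ \mathcal{C} := (k_{11}, \ldots, k_{1n},\ k_{21}, \ldots, k_{2n},\ \ldots,\ k_{m1}, \ldots, k_{mn}). \]

Observe that

\[ |\mathcal{R}^1| = |\mathcal{R}^2| = \cdots = |\mathcal{R}^p| = |\mathcal{C}| = |K| \]

and that

\[ \mathcal{R} = \alpha_1 \mathcal{R}^1 + \alpha_2 \mathcal{R}^2 + \cdots + \alpha_p \mathcal{R}^p. \]

Define $W = (w_{ij})$ as the $(m + n) \times (mn)$ matrix with

\[ w_{i,\,(i-1)n+j} = 1 \quad\text{for all } i = 1, \ldots, m \text{ and } j = 1, \ldots, n, \]

\[ w_{m+j,\,(i-1)n+j} = 1 \quad\text{for all } i = 1, \ldots, m \text{ and } j = 1, \ldots, n, \]

and zeroes in all other positions.

Given a contingency table $A = (a_{ij}) \in \Pi_K(R, C)$, we may construct a table
$A' = (a'_{ij}) \in \Pi(\mathcal{R}, \mathcal{C})$ by assigning

\[ a'_{i,\,(i-1)n+j} = a_{ij} \quad\text{for all } i = 1, \ldots, m \text{ and } j = 1, \ldots, n, \]

\[ a'_{m+j,\,(i-1)n+j} = k_{ij} - a_{ij} \quad\text{for all } i = 1, \ldots, m \text{ and } j = 1, \ldots, n, \]

and zeroes in all other positions. This conversion is easily reversed, and thus gives a
bijection between tables $A \in \Pi_K(R, C)$ and tables $A' \in \Pi(\mathcal{R}, \mathcal{C})$ which have enforced
zeroes in all zero positions of W. That is,

\[ T_K(R, C) = T(\mathcal{R}, \mathcal{C}; W). \qquad (3) \]

Similarly,

\[ T_K(R^t, C^t) = T(\mathcal{R}^t, \mathcal{C}; W) \qquad (4) \]

for $t = 1, \ldots, p$.

Substituting $\mathcal{R}^1, \ldots, \mathcal{R}^p, \mathcal{R}$ for $R^1, \ldots, R^p, R$ in the statement of Theorem 5, as well
as $\mathcal{C}$ for $C^1, \ldots, C^p, C$ and $|K|$ for N, we obtain the conclusion

\[ \frac{\omega(|K|)\, T(\mathcal{R}, \mathcal{C}; W)}{\Omega(\mathcal{R})\, \Omega(\mathcal{C})} \ \ge\ \prod_{t=1}^p \left[ \frac{T(\mathcal{R}^t, \mathcal{C}; W)}{\min\{\Omega(\mathcal{R}^t),\ \Omega(\mathcal{C})\}} \right]^{\alpha_t}. \]

Using equations (3) and (4), and recalling the definitions of $\mathcal{R}$, $\mathcal{R}^t$, $\mathcal{C}$, we rewrite the
above result as

\[ \frac{\omega(|K|)\, T_K(R, C)}{\Omega(R)\, \Omega(\bar C)\, \Omega(K)} \ \ge\ \prod_{t=1}^p \left[ \frac{T_K(R^t, C^t)}{\min\{\Omega(R^t)\Omega(\bar C^t),\ \Omega(K)\}} \right]^{\alpha_t}, \]

proving Theorem 1. ■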
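The embedding used in this proof can be made concrete. The Python sketch below is a hedged illustration with 0-based indices; `embed_table` is a hypothetical helper name. It builds the $(m + n) \times (mn)$ table from a K-bounded table A, so that one can check that its row sums are the margins of A followed by the complementary column sums, and that its column sums are the entries of K read row by row.

```python
def embed_table(A, K):
    """Map an m x n table A with 0 <= a_ij <= k_ij to the (m+n) x (mn) table
    whose only possibly nonzero entries sit in the positions where W equals 1."""
    m, n = len(A), len(A[0])
    big = [[0] * (m * n) for _ in range(m + n)]
    for i in range(m):
        for j in range(n):
            col = i * n + j                        # 0-based version of (i-1)n + j
            big[i][col] = A[i][j]                  # row i carries a_ij
            big[m + j][col] = K[i][j] - A[i][j]    # row m+j carries k_ij - a_ij
    return big

# Example with m = n = 2 and all bounds equal to 2.
A = [[1, 0],
     [2, 1]]
K = [[2, 2],
     [2, 2]]
B = embed_table(A, K)
row_sums = [sum(row) for row in B]
col_sums = [sum(col) for col in zip(*B)]
```

In this example `row_sums` consists of the row sums (1, 3) of A followed by the complementary column sums (4 - 3, 4 - 1), and every entry of `col_sums` equals the bound 2.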

Remarks. The reader may notice that Theorem 1 can be stated in greater generality
with only trivial modifications to the proof. For instance, $T_K(R, C)$ can be replaced
by a weighted function $T_K(R, C; W)$, analogous to the function $T(R, C; W)$ in the
statement of Theorem 5; also, given

\[ K = \alpha_1 K^1 + \alpha_2 K^2 + \cdots + \alpha_p K^p, \]

Theorem 1 remains true when each instance of K on the right-hand side is replaced
by $K^t$.

Theorem 5, here taken for granted, was originally derived from estimates for
$T(R, C; W)$ obtained by Barvinok [2] via the theory of permanents and matrix
scaling. Those estimates can be converted to the K-bounded setting in the manner
illustrated above; however, we are able to bypass this step in the theory, as the more
refined instrument of Theorem 5 is directly adaptable to our needs.



2.2. An honestly concave proxy for $\ln T_K(R, C)$. We now begin to assemble the
ingredients for the proof of Theorem 2. Throughout this section, assume $K \in \mathbb{Z}^{m \times n}_{\ge 0}$.
We define a function which "smooths over" $\ln T_K(R, C)$:

Definition 8. For $R \in \mathbb{R}^m_{\ge 0}$, $C \in \mathbb{R}^n_{\ge 0}$ and $K \in \mathbb{Z}^{m \times n}_{\ge 0}$, let

\[ f(R, C) = f_K(R, C) := \max \sum_{t=1}^p \alpha_t \ln T_K(R^t, C^t), \]

the maximum being subject to $\alpha_1, \ldots, \alpha_p \ge 0$, $\alpha_1 + \cdots + \alpha_p = 1$,
$\alpha_1 R^1 + \cdots + \alpha_p R^p = R$, and $\alpha_1 C^1 + \cdots + \alpha_p C^p = C$.

(To be clear, the maximum is taken over choices of $p \ge 1$, $\alpha_1, \ldots, \alpha_p$, $R^1, \ldots, R^p$,
and $C^1, \ldots, C^p$ which satisfy the indicated constraints, and for which the summation
on the right is defined. If the maximum is taken over an empty set, then we regard
it as $-\infty$.)

Note that the maximum in Definition 8 is well-defined (allowing for the $-\infty$ case),
because there are finitely many pairs $(R, C)$ for which $T_K(R, C) > 0$. It is redundant
to allow any repetition among $R^1, \ldots, R^p$ or $C^1, \ldots, C^p$, so the summation on the
right takes on finitely many values.

Lemma 1 (Properties of $f(R, C)$).

(i) $f(R, C) \ge \ln T_K(R, C)$.

(ii) f is concave.

The proof of this lemma is straightforward, but the notation is cumbersome, so it
is deferred to an appendix.

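Lemma 2 (Quality of approximation). Suppose $R \in \mathbb{Z}^m_{>0}$, $C \in \mathbb{Z}^n_{>0}$, and $K \in \mathbb{Z}^{m \times n}_{\ge 0}$.
Define $\bar C = (\bar c_1, \ldots, \bar c_n)$ by $\bar c_j := \left( \sum_{i=1}^m k_{ij} \right) - c_j$, and suppose that $\bar C \in \mathbb{Z}^n_{>0}$. Then

\[ f_K(R, C) - \ln T_K(R, C) \ \le\ -\ln \sqrt{2\pi |K|} + \sum_{i=1}^m \ln \sqrt{2\pi r_i} + \sum_{j=1}^n \ln \sqrt{2\pi \bar c_j} + (m + n) \ln\!\left( \frac{e}{\sqrt{2\pi}} \right). \]

Proof. By Stirling's formula,

\[ n - \ln \sqrt{2\pi n} - \ln\!\left( \frac{e}{\sqrt{2\pi}} \right) \ \le\ \ln \omega(n) \ \le\ n - \ln \sqrt{2\pi n} \qquad (5) \]

for $n \ge 1$.

Choose $\alpha_1, \ldots, \alpha_p$, $R^1, \ldots, R^p$, $C^1, \ldots, C^p$ which achieve the maximum in
Definition 8. Now apply Theorem 1 and (5):

\[ f_K(R, C) - \ln T_K(R, C) \ \le\ \ln \left[ \frac{\omega(|K|)}{\Omega(R)\Omega(\bar C)\Omega(K)} \prod_{t=1}^p \min\{\Omega(R^t)\Omega(\bar C^t),\ \Omega(K)\}^{\alpha_t} \right] \]

\[ \le\ \ln \left[ \frac{\omega(|K|)}{\Omega(R)\Omega(\bar C)\Omega(K)} \prod_{t=1}^p \Omega(K)^{\alpha_t} \right] \]

\[ =\ \ln \left[ \frac{\omega(|K|)}{\Omega(R)\Omega(\bar C)} \right] \]

\[ =\ \ln \omega(|K|) - \sum_{i=1}^m \ln \omega(r_i) - \sum_{j=1}^n \ln \omega(\bar c_j) \]

\[ \le\ |K| - \ln \sqrt{2\pi |K|} - \sum_{i=1}^m \left( r_i - \ln \sqrt{2\pi r_i} - \ln \frac{e}{\sqrt{2\pi}} \right) - \sum_{j=1}^n \left( \bar c_j - \ln \sqrt{2\pi \bar c_j} - \ln \frac{e}{\sqrt{2\pi}} \right) \]

\[ \le\ -\ln \sqrt{2\pi |K|} + \sum_{i=1}^m \ln \sqrt{2\pi r_i} + \sum_{j=1}^n \ln \sqrt{2\pi \bar c_j} + (m + n) \ln\!\left( \frac{e}{\sqrt{2\pi}} \right), \]

where the last step uses $\sum_i r_i + \sum_j \bar c_j = |K|$. □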
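The Stirling-type bounds used in this proof can be sanity-checked numerically. The Python sketch below assumes the two-sided bound in the form $n - \ln\sqrt{2\pi n} - \ln(e/\sqrt{2\pi}) \le \ln \omega(n) \le n - \ln\sqrt{2\pi n}$ for $n \ge 1$, which is our reading of inequality (5); the helper names `log_omega` and `stirling_bounds` are our own.

```python
import math

def log_omega(n):
    """ln(n^n / n!), with the convention 0^0 = 1."""
    return (n * math.log(n) if n > 0 else 0.0) - math.lgamma(n + 1)

def stirling_bounds(n):
    """Assumed form of the bounds in (5): returns (lower, upper) for n >= 1."""
    upper = n - 0.5 * math.log(2 * math.pi * n)
    lower = upper - math.log(math.e / math.sqrt(2 * math.pi))
    return lower, upper

# Largest slack on either side over a range of n (lower bound is tight at n = 1).
worst_lower = min(log_omega(n) - stirling_bounds(n)[0] for n in range(1, 201))
worst_upper = min(stirling_bounds(n)[1] - log_omega(n) for n in range(1, 201))
```

Both `worst_lower` and `worst_upper` should be nonnegative up to floating-point rounding, with the lower bound attained at n = 1.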
2.3. Exact and approximate generating functions for tables. Theorem 2
refers to the following polynomial:

\[ \mathcal{G}(x, y) := \prod_{i=1}^m \prod_{j=1}^n \left[ 1 + x_i y_j + (x_i y_j)^2 + \cdots + (x_i y_j)^{k_{ij}} \right] \]

($x = (x_1, \ldots, x_m)$, $y = (y_1, \ldots, y_n)$).

Lemma 3. $\mathcal{G}$ is a generating function for K-bounded contingency tables; that is,

\[ \mathcal{G}(x, y) = \sum_R \sum_C T_K(R, C)\, x^R y^C, \qquad (6) \]

where the sum is taken over all possible margins R, C (of lengths m and n).

Proof. Trivial. □
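For very small K the generating-function identity can be verified by direct expansion. The Python sketch below is an illustration under our own naming (`expand_generating_function` is hypothetical): it multiplies out the product one factor at a time and stores the coefficient of $x^R y^C$ in a dictionary keyed by the pair of exponent vectors.

```python
from collections import defaultdict

def expand_generating_function(K):
    """Expand prod_{i,j} (1 + x_i y_j + ... + (x_i y_j)^{k_ij}).

    Returns a dict mapping (R, C), a pair of exponent tuples, to the coefficient
    of x^R y^C. The cost grows quickly, so this is only for tiny examples."""
    m, n = len(K), len(K[0])
    poly = {((0,) * m, (0,) * n): 1}
    for i in range(m):
        for j in range(n):
            new = defaultdict(int)
            for (R, C), coeff in poly.items():
                for t in range(K[i][j] + 1):     # choose the entry a_ij = t
                    R2 = R[:i] + (R[i] + t,) + R[i + 1:]
                    C2 = C[:j] + (C[j] + t,) + C[j + 1:]
                    new[(R2, C2)] += coeff
            poly = dict(new)
    return poly

poly = expand_generating_function([[2, 2], [2, 2]])
coefficient = poly[((2, 2), (2, 2))]   # number of tables with these margins, entries <= 2
```

Setting all variables to 1 sums every coefficient, so `sum(poly.values())` equals the total number of K-bounded matrices; this is also the simplest instance of the upper bound obtained by evaluating the quotient at a single point.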

In principle, we can "compute" $T_K(R, C)$ by expanding $\mathcal{G}(x, y)$ and extracting the
coefficient of $x^R y^C$. This is of course not practical, but we might estimate this
coefficient by

\[ \inf_{x_i, y_j > 0} \frac{\mathcal{G}(x, y)}{x^R y^C}; \]

indeed, this is an upper bound on $T_K(R, C)$, as may be readily seen by dividing both
sides of (6) by $x^R y^C$. To bound $T_K(R, C)$ from the other side, we replace $\mathcal{G}(x, y)$
by an approximate version with smoother coefficients:

Definition 9. Let

\[ G(x, y) := \sum_R \sum_C e^{f(R, C)}\, x^R y^C, \]

where the sum is taken over all integer margins $(R, C)$ such that $f(R, C) > -\infty$.
(See Definition 8 for the meaning of $f(R, C)$.)

We will find the following lemma useful, as it will allow us to pick out any nonzero
term of $G(x, y)$ as the largest:

Lemma 4. For any $(R_*, C_*)$ in the relative interior of the domain of f, there exist
$x_*, y_* > 0$ such that the function

\[ \Phi(R, C) := e^{f(R, C)}\, x_*^R\, y_*^C \]

attains its maximum at $R = R_*$, $C = C_*$.

Proof. Recall that f is concave; therefore, its graph has a supporting hyperplane
over $(R_*, C_*)$. Let such a hyperplane have outward-pointing normal vector
$(u_1, \ldots, u_m, v_1, \ldots, v_n, 1)$. Set

\[ x_* = (x_1, \ldots, x_m) = (e^{u_1}, \ldots, e^{u_m}) \quad\text{and}\quad y_* = (y_1, \ldots, y_n) = (e^{v_1}, \ldots, e^{v_n}). \]

Then

\[ \varphi(R, C) := f(R, C) + \sum_{i=1}^m r_i \ln x_i + \sum_{j=1}^n c_j \ln y_j \]

is concave with respect to R and C, and attains a critical point (hence its global
maximum) at $(R_*, C_*)$. Therefore, so does $\Phi(R, C) = e^{\varphi(R, C)}$. □

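2.4. Proof of Theorem 2. Assume the hypotheses of Theorem 2. Using Lemma 4,
choose $x_*, y_*$ so that $e^{f(R, C)} x_*^R y_*^C$ is the largest term in the expansion of $G(x, y)$,
evaluated at $x = x_*$ and $y = y_*$. Thus

\[ \frac{G(x_*, y_*)}{x_*^R y_*^C} \ \le\ [\#\text{ of terms of } G \text{ with nonzero coeffs.}] \cdot e^{f(R, C)}. \]

The number of terms of G is at most

\[ \mathcal{N} := \prod_{i=1}^m \left( 1 + \sum_{j=1}^n k_{ij} \right) \prod_{j=1}^n \left( 1 + \sum_{i=1}^m k_{ij} \right), \]

since $T_K(R, C) > 0$ implies that R and C do not exceed the margins of K.
Let the symbol $\heartsuit$ denote the quantity

\[ -\ln \sqrt{2\pi |K|} + \sum_{i=1}^m \ln \sqrt{2\pi r_i} + \sum_{j=1}^n \ln \sqrt{2\pi \bar c_j} + (m + n) \ln\!\left( \frac{e}{\sqrt{2\pi}} \right), \]

last seen in Lemma 2.

We deduce the following chain of inequalities:

\[ \ln \left( \inf_{x_i, y_j > 0} \frac{\mathcal{G}(x, y)}{x^R y^C} \right) \ \ge\ \ln T_K(R, C) \]

\[ \ge\ f(R, C) - \heartsuit \]

\[ \ge\ \ln \left( \inf_{x_i, y_j > 0} \frac{G(x, y)}{x^R y^C \cdot \mathcal{N}} \right) - \heartsuit \]

\[ \ge\ \ln \left( \inf_{x_i, y_j > 0} \frac{\mathcal{G}(x, y)}{x^R y^C} \right) - \ln \mathcal{N} - \heartsuit. \qquad (7) \]

Now we consider the cloning of the margins. Let $\mathcal{G}^{(s)}$ denote the generating function
for $K^{(s)}$-bounded contingency tables. Letting

\[ x^{(s)} := (x^1, x^2, \ldots, x^s) = (x^1_1, \ldots, x^1_m,\ x^2_1, \ldots, x^2_m,\ \ldots,\ x^s_1, \ldots, x^s_m) \]

and defining $y^{(s)}$ similarly, we note that

\[ \mathcal{G}^{(s)}(x^{(s)}, y^{(s)}) = \prod_{a=1}^s \prod_{b=1}^s \mathcal{G}(x^a, y^b). \]

From this it follows that

\[ \inf_{x^{(s)}, y^{(s)} > 0} \frac{\mathcal{G}^{(s)}(x^{(s)}, y^{(s)})}{(x^{(s)})^{R^{(s)}} (y^{(s)})^{C^{(s)}}} = \left( \inf_{x_i, y_j > 0} \frac{\mathcal{G}(x, y)}{x^R y^C} \right)^{s^2} \]

for all $s \ge 1$.

Inspection of the formulas for $\ln \mathcal{N}$ and $\heartsuit$ shows that both of these terms from (7)
have growth of order $O(s \ln s)$ as $s \to \infty$. Therefore, by (7),

\[ \frac{1}{s^2} \ln T_{K^{(s)}}(R^{(s)}, C^{(s)}) = \ln \left( \inf_{x_i, y_j > 0} \frac{\mathcal{G}(x, y)}{x^R y^C} \right) + O\!\left( \frac{\ln s}{s} \right), \]

from which Theorem 2 follows. ■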



3. Entropy-based estimates for $T_\kappa(R, C)$

In section 1.3 we introduced the functions $H^{\max}_\kappa(x)$ and $p(x; \kappa)$, $q(x; \kappa)$; refer to
equations (1), (2) for an implicit description of the latter. We now list a few useful
facts about $H^{\max}_\kappa$:

Lemma 5. Let $p = p(x; \kappa)$, $q = q(x; \kappa)$.
(i) $H^{\max}_\kappa$ is strictly concave on its domain.
(ii) $H^{\max}_\kappa(x) = -[\ln p + x \ln q]$.
(iii) For $0 < x < \kappa$, $\frac{d}{dx} H^{\max}_\kappa(x) = -\ln q$.

Proof. First we prove claim (i). Let $x, y \in [0, \kappa]$ and $\alpha, \beta > 0$ such that $\alpha + \beta = 1$.
We wish to prove that

\[ H^{\max}_\kappa(\alpha x + \beta y) > \alpha H^{\max}_\kappa(x) + \beta H^{\max}_\kappa(y) \quad (x \ne y). \]

Let X and Y be independent random variables with distributions $TG(x; \kappa)$ and
$TG(y; \kappa)$, respectively. Define a random variable Z whose distribution is a mixture
of X and Y with weights $\alpha$ and $\beta$; that is,

\[ \Pr[Z = t] = \alpha\, p(x; \kappa)\, q(x; \kappa)^t + \beta\, p(y; \kappa)\, q(y; \kappa)^t \quad\text{for } t = 0, 1, \ldots, \kappa. \]

Then

\[ E[Z] = \alpha x + \beta y \]

and

\[ H[Z] > \alpha H[X] + \beta H[Y] \]

(where H denotes the entropy, which is well-known to be strictly concave with
respect to mixture). But

\[ H^{\max}_\kappa(\alpha x + \beta y) \ge H[Z], \]

since $H^{\max}_\kappa(\alpha x + \beta y)$ is the maximum entropy achieved by any random variable
supported on $\{0, 1, \ldots, \kappa\}$ with expectation $\alpha x + \beta y$. This concludes the proof of (i).

Claim (ii) follows readily from equations (1), (2) and the properties of logarithms.
Let $p = p(x; \kappa)$, $q = q(x; \kappa)$. Then by definition of entropy, we have

\[ H^{\max}_\kappa(x) = -[p \ln p + pq \ln(pq) + pq^2 \ln(pq^2) + \cdots + pq^\kappa \ln(pq^\kappa)] \]

\[ = -[p \ln p + pq(\ln p + \ln q) + pq^2(\ln p + 2 \ln q) + \cdots + pq^\kappa(\ln p + \kappa \ln q)] \]

\[ = -[(p + pq + pq^2 + \cdots + pq^\kappa)(\ln p) + (pq + 2pq^2 + \cdots + \kappa pq^\kappa)(\ln q)] \]

\[ = -[\ln p + x \ln q]. \]

Differentiating this formula with respect to x, and again using equations (1) and
(2), we obtain

\[ (H^{\max}_\kappa)'(x) = -\frac{p'}{p} - x \frac{q'}{q} - \ln q \]

\[ = p \cdot \left( \frac{1}{p} \right)' - p(q + 2q^2 + \cdots + \kappa q^\kappa) \cdot \frac{q'}{q} - \ln q \]

\[ = p q'(1 + 2q + \cdots + \kappa q^{\kappa - 1}) - p q'(1 + 2q + \cdots + \kappa q^{\kappa - 1}) - \ln q \]

\[ = -\ln q. \]

This proves claim (iii). □



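Like the "$(\kappa + 1)$-nomial coefficients" of section 1.4, the functions $H^{\max}_\kappa$, p, and q
admit simple explicit formulas only when $\kappa = 1$ or $\kappa = \infty$. To wit:

\[ H^{\max}_1(x) = -x \ln x - (1 - x) \ln(1 - x), \qquad p(x; 1) = 1 - x, \qquad q(x; 1) = \frac{x}{1 - x}; \]

\[ H^{\max}_\infty(x) = (x + 1) \ln(x + 1) - x \ln x, \qquad p(x; \infty) = \frac{1}{x + 1}, \qquad q(x; \infty) = \frac{x}{x + 1}. \]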

In fact, there is a close relationship between these functions and the $(\kappa + 1)$-nomial
coefficients:
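Lemma 6. Let $\kappa \in \mathbb{Z}_{>0} \cup \{\infty\}$. Let n, r be integers ($n > 0$, $0 \le r \le n\kappa$). Then

\[ \lim_{s \to \infty} \frac{1}{s} \ln \binom{sn}{sr}_\kappa = n\, H^{\max}_\kappa\!\left( \frac{r}{n} \right). \]

Proof. Let $X_1, X_2, \ldots$ be independent random variables, each with distribution
$TG(r/n; \kappa)$. Let $X = (X_1, \ldots, X_{sn})$.

Observe that if $x, x' \in \{0, 1, \ldots, \kappa\}^{sn}$, then

\[ \frac{\Pr[X = x]}{\Pr[X = x']} = q^{|x| - |x'|} \]

(where $|x| := \sum_{i=1}^{sn} x_i$). In particular, all values of X with equal sum of coordinates
are equiprobable. Let $x_*$ denote an arbitrary value for X satisfying $|x_*| = sr$.

Recall that the Shannon self-information of a value $X = x$ is defined as

\[ I(x) := -\ln \Pr[X = x]; \]

the entropy of X is the expected self-information of its value. Thus we have

\[ sn\, H^{\max}_\kappa\!\left( \frac{r}{n} \right) = H[X] = E[I(X)] = I(x_*) - (\ln q)\, E[|X| - sr] \]

\[ = -\ln \left[ \binom{sn}{sr}_\kappa^{-1} \cdot \Pr[|X| = sr] \right] = \ln \binom{sn}{sr}_\kappa - \ln \Pr[|X| = sr]. \qquad (8) \]

Note that the probability mass function for each $X_i$ is log-concave on $\mathbb{Z}$. We apply
a local limit theorem of Bender (see Appendix, Theorem 7) using

\[ \zeta_p = X_1 + \cdots + X_p, \qquad \sigma_p^2 = p \cdot \mathrm{Var}(X_1), \qquad \mu_p = p \cdot \frac{r}{n}, \]

and

\[ x = 0, \]

with the normality hypothesis secured via Lyapunov's central limit theorem
(Appendix, Theorem 6), to infer

\[ \lim_{p \to \infty} \sigma_p \Pr[\zeta_p = \lfloor \mu_p \rfloor] = \frac{1}{\sqrt{2\pi}} \]

and thence

\[ \Pr[|X| = sr] \sim \frac{1}{\sqrt{2\pi\, sn\, \mathrm{Var}(X_1)}}. \]

Substituting into (8), we conclude that

\[ \ln \binom{sn}{sr}_\kappa = sn\, H^{\max}_\kappa\!\left( \frac{r}{n} \right) + \ln \left[ (2\pi\, sn\, \mathrm{Var}(X_1))^{-1/2} + o(s^{-1/2}) \right] = sn\, H^{\max}_\kappa\!\left( \frac{r}{n} \right) - \Theta(\ln s); \]

in particular, this proves the lemma. □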
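The convergence in Lemma 6 is slow (the proof indicates a correction of order $\ln s$), and this is visible numerically. The Python sketch below is a hedged illustration restricted to two cases where the entropy has a simple closed form: $\kappa = 1$, and $\kappa = 2$ at mean 1, where the truncated geometric law is uniform on $\{0, 1, 2\}$ and has entropy $\ln 3$. The names `knomial`, `rate`, and `entropy_kappa1` are our own.

```python
import math

def knomial(n, r, kappa):
    """Coefficient of x^r in (1 + x + ... + x^kappa)^n, by repeated convolution."""
    coeffs = [1]
    for _ in range(n):
        new = [0] * (len(coeffs) + kappa)
        for i, c in enumerate(coeffs):
            for t in range(kappa + 1):
                new[i + t] += c
        coeffs = new
    return coeffs[r]

def rate(n, r, kappa, s):
    """(1/s) * ln of the coefficient with upper index s*n and lower index s*r."""
    return math.log(knomial(s * n, s * r, kappa)) / s

def entropy_kappa1(x):
    """Explicit entropy formula for kappa = 1."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

# kappa = 1, n = 3, r = 1: the limit should be 3 * H(1/3).
limit1 = 3 * entropy_kappa1(1 / 3)
gaps1 = [limit1 - rate(3, 1, 1, s) for s in (20, 200)]

# kappa = 2, n = 2, r = 2: mean 1, uniform law, so the limit should be 2 * ln 3.
limit2 = 2 * math.log(3)
gaps2 = [limit2 - rate(2, 2, 2, s) for s in (10, 50)]
```

In both cases the gap is positive and shrinks as s grows, roughly like $(\ln s)/s$.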
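3.1. A dual to the optimization problem in Theorem 2. In Theorem 2, we
"computed" $\ln T_K(R, C)$ asymptotically as

\[ \ln \left( \inf_{x_i, y_j > 0} \frac{\mathcal{G}(x, y)}{x^R y^C} \right). \qquad (9) \]

We now show:

Lemma 7. Suppose $k_{ij} = \kappa$ for $1 \le i \le m$, $1 \le j \le n$. Then the quantity given in
(9) is equal to

\[ \max_{Z \in \Pi_\kappa(R, C)} \sum_{i=1}^m \sum_{j=1}^n H^{\max}_\kappa(z_{ij}). \qquad (10) \]

Proof. By Lemma 5, $H^{\max}_\kappa(x)$ is strictly concave. Also, $(H^{\max}_\kappa)'(x) = -\ln q(x; \kappa)$
approaches $\infty$ as $x \to 0$ and $-\infty$ as $x \to \kappa$; therefore, the maximum in (10) is
well-defined and is attained in the relative interior of $\Pi_\kappa(R, C)$. For the remainder
of this proof, let Z denote the (unique) location at which the maximum is attained,
and let $p_{ij} := p(z_{ij}; \kappa)$, $q_{ij} := q(z_{ij}; \kappa)$.

Since Z is in the interior of $\Pi_\kappa(R, C)$, the local defining equations for $\Pi_\kappa(R, C)$ at
Z are just

\[ \sum_{j=1}^n a_{ij} = r_i \ (1 \le i \le m) \quad\text{and}\quad \sum_{i=1}^m a_{ij} = c_j \ (1 \le j \le n). \]

Introducing Lagrange multipliers for these constraints, we infer that
$\ln q_{ij} = \lambda_i + \mu_j$ for some constants $\lambda_1, \ldots, \lambda_m, \mu_1, \ldots, \mu_n$. Define
$\xi_i := e^{\lambda_i}$, $\eta_j := e^{\mu_j}$; thus $q_{ij} = \xi_i \eta_j$.
Dividing equation (2) by equation (1) (see section 1.3), we obtain

\[ z_{ij} = \frac{\xi_i \eta_j + 2(\xi_i \eta_j)^2 + \cdots + \kappa (\xi_i \eta_j)^\kappa}{1 + \xi_i \eta_j + (\xi_i \eta_j)^2 + \cdots + (\xi_i \eta_j)^\kappa}. \]

For real-valued $t = (t_1, \ldots, t_m)$ and $s = (s_1, \ldots, s_n)$, let

\[ \psi(t, s) := \ln \frac{\mathcal{G}(x, y)}{x^R y^C} \bigg|_{x_i = e^{t_i},\ y_j = e^{s_j}} = -\sum_{i=1}^m r_i t_i - \sum_{j=1}^n c_j s_j + \sum_{i=1}^m \sum_{j=1}^n \ln \left( 1 + e^{t_i + s_j} + e^{2(t_i + s_j)} + \cdots + e^{\kappa(t_i + s_j)} \right). \]

This function is strictly convex, and has a critical point (hence a global minimum)
at $(t, s)$ if and only if the gradient is zero, that is, if

\[ r_i = \sum_{j=1}^n \frac{e^{t_i + s_j} + 2e^{2(t_i + s_j)} + \cdots + \kappa e^{\kappa(t_i + s_j)}}{1 + e^{t_i + s_j} + e^{2(t_i + s_j)} + \cdots + e^{\kappa(t_i + s_j)}}, \quad 1 \le i \le m \]

\[ \text{and}\quad c_j = \sum_{i=1}^m \frac{e^{t_i + s_j} + 2e^{2(t_i + s_j)} + \cdots + \kappa e^{\kappa(t_i + s_j)}}{1 + e^{t_i + s_j} + e^{2(t_i + s_j)} + \cdots + e^{\kappa(t_i + s_j)}}, \quad 1 \le j \le n. \]

These conditions are satisfied at $t = (\lambda_1, \ldots, \lambda_m)$ and $s = (\mu_1, \ldots, \mu_n)$. The
minimum value of $\psi$ is thus

\[ \psi(t, s) = -\sum_{i=1}^m r_i \lambda_i - \sum_{j=1}^n c_j \mu_j + \sum_{i=1}^m \sum_{j=1}^n \ln \left( 1 + \xi_i \eta_j + (\xi_i \eta_j)^2 + \cdots + (\xi_i \eta_j)^\kappa \right) \]

\[ = \sum_{i=1}^m \sum_{j=1}^n \left[ -z_{ij}(\lambda_i + \mu_j) + \ln(1 + q_{ij} + q_{ij}^2 + \cdots + q_{ij}^\kappa) \right] \]

\[ = \sum_{i=1}^m \sum_{j=1}^n \left[ -z_{ij} \ln q_{ij} + \ln \frac{1}{p_{ij}} \right] \]

\[ = \sum_{i=1}^m \sum_{j=1}^n H^{\max}_\kappa(z_{ij}). \]

This proves the lemma. □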
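The duality in Lemma 7 suggests a simple numerical recipe: minimize the convex dual objective in the variables $t_i = \ln x_i$, $s_j = \ln y_j$ and read off the maximizing table Z from the gradient condition. The Python sketch below uses plain gradient descent with an ad hoc step size and iteration count (our own choices, adequate only for small well-conditioned examples with constant finite bound $\kappa$); all function names are hypothetical.

```python
import math

def log_norm(u, kappa):
    """ln(1 + e^u + ... + e^(kappa*u)), computed stably."""
    top = max(0.0, kappa * u)
    return top + math.log(sum(math.exp(t * u - top) for t in range(kappa + 1)))

def mean(u, kappa):
    """Mean of the truncated geometric law with ln q = u."""
    top = max(0.0, kappa * u)
    w = [math.exp(t * u - top) for t in range(kappa + 1)]
    return sum(t * wt for t, wt in enumerate(w)) / sum(w)

def psi(t, s, R, C, kappa):
    """Dual objective: ln of the generating function quotient at x = e^t, y = e^s."""
    value = -sum(r * ti for r, ti in zip(R, t)) - sum(c * sj for c, sj in zip(C, s))
    return value + sum(log_norm(ti + sj, kappa) for ti in t for sj in s)

def minimize_psi(R, C, kappa, steps=5000, lr=0.1):
    """Gradient descent on psi; returns (t, s, Z) with z_ij = mean(t_i + s_j)."""
    t, s = [0.0] * len(R), [0.0] * len(C)
    for _ in range(steps):
        Z = [[mean(ti + sj, kappa) for sj in s] for ti in t]
        grad_t = [sum(row) - r for row, r in zip(Z, R)]
        grad_s = [sum(col) - c for col, c in zip(zip(*Z), C)]
        t = [ti - lr * g for ti, g in zip(t, grad_t)]
        s = [sj - lr * g for sj, g in zip(s, grad_s)]
    Z = [[mean(ti + sj, kappa) for sj in s] for ti in t]
    return t, s, Z

def entropy_of_mean(x, kappa):
    """Entropy of the truncated geometric law with mean x (bisection on ln q)."""
    lo, hi = -60.0, 60.0
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if mean(mid, kappa) < x else (lo, mid)
    u = (lo + hi) / 2
    return log_norm(u, kappa) - x * u

R, C, kappa = (1, 2), (2, 1), 2
t, s, Z = minimize_psi(R, C, kappa)
dual_value = psi(t, s, R, C, kappa)
primal_value = sum(entropy_of_mean(z, kappa) for row in Z for z in row)
```

At convergence the table Z has margins R and C, and `dual_value` agrees with `primal_value`, the sum of entropies at Z. Moving Z along the polytope in either direction should lower the entropy sum, which is a quick check that Z is the maximizer.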



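3.2. Proof of Theorem 3. Part (i) follows directly from Theorem 2 and Lemma 7.
For part (ii), recall that

\[ I_\kappa(R, C) = \binom{mn}{N}_\kappa^{-1} \prod_{i=1}^m \binom{n}{r_i}_\kappa \prod_{j=1}^n \binom{m}{c_j}_\kappa \]

(Definition 7). Thus

\[ I_\kappa(R^{(s)}, C^{(s)}) = \binom{s^2 mn}{s^2 N}_\kappa^{-1} \left( \prod_{i=1}^m \binom{sn}{s r_i}_\kappa \right)^{s} \left( \prod_{j=1}^n \binom{sm}{s c_j}_\kappa \right)^{s}. \]

Applying Lemma 6, we obtain

\[ \ln I_\kappa(R^{(s)}, C^{(s)}) = -\left[ s^2 mn\, H^{\max}_\kappa\!\left( \frac{N}{mn} \right) + o(s^2) \right] + s \sum_{i=1}^m \left[ sn\, H^{\max}_\kappa\!\left( \frac{r_i}{n} \right) + o(s) \right] + s \sum_{j=1}^n \left[ sm\, H^{\max}_\kappa\!\left( \frac{c_j}{m} \right) + o(s) \right] \]

\[ = s^2 \left[ n \sum_{i=1}^m H^{\max}_\kappa\!\left( \frac{r_i}{n} \right) + m \sum_{j=1}^n H^{\max}_\kappa\!\left( \frac{c_j}{m} \right) - mn\, H^{\max}_\kappa\!\left( \frac{N}{mn} \right) + o(1) \right]. \]

This completes the proof of Theorem 3.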



3.3. The entropy loss function. The following function plays a key role in the
proof of Theorem 4.

Definition 10. Fix $\kappa \in \mathbb{Z}_{>0} \cup \{\infty\}$. Given nonnegative $\alpha_1, \alpha_2, \ldots, \alpha_n$ such that
$\alpha_1 + \alpha_2 + \cdots + \alpha_n = 1$, let

\[ J(r) = J_\kappa(r) := n\, H^{\max}_\kappa\!\left( \frac{r}{n} \right) - \sum_{i=1}^n H^{\max}_\kappa(r \alpha_i) \]

for all $r \ge 0$ such that $r\alpha_1, r\alpha_2, \ldots, r\alpha_n \le \kappa$.

In the spirit of the remarks at the beginning of section 1.3, we propose the following
(informal) interpretation for $J(r)$. (The reader who is interested only in formal proof
may skip to section 3.4.)

Suppose A is an $m \times n$ contingency table, about which we know only that the sum
of entries is N. In order to guess what A looks like, we might sample mn entries
independently from $TG\!\left( \frac{N}{mn}; \kappa \right)$. The resulting matrix might not have sum of entries
exactly equal to N, but at least it is correct in expectation, and all actual tables
with sum of entries N are equally likely to be chosen. The entropy of our random
model is $mn\, H^{\max}_\kappa\!\left( \frac{N}{mn} \right)$.

If we subsequently learn that A has column sums $C = (c_1, \ldots, c_n)$, then we might
revise our model by drawing entries in the j-th column from $TG\!\left( \frac{c_j}{m}; \kappa \right)$. The
entropy of the random model is then $m \sum_{j=1}^n H^{\max}_\kappa\!\left( \frac{c_j}{m} \right)$. The information gained
may be measured by the entropy lost, which is equal to $m J\!\left( \frac{N}{m} \right)$ for $\alpha_j = \frac{c_j}{N}$
($j = 1, \ldots, n$).

The same comparison may be made in the presence of known row sums
$R = (r_1, \ldots, r_m)$. Before learning C, we sample entry $(i, j)$ from $TG\!\left( \frac{r_i}{n}; \kappa \right)$; after
learning C, we sample entry $(i, j)$ from $TG\!\left( \frac{r_i c_j}{N}; \kappa \right)$. Of course, this guess is entirely
naive: it essentially regards the rank 1 table as typical, which is a doubtful
assumption. We do not propose that this is the best guess, but it does have the
advantage of being computable. The entropy lost to this model when incorporating C
in the presence of known R is

\[ \sum_{i=1}^m J(r_i), \]

where $\alpha_j = \frac{c_j}{N}$ ($j = 1, \ldots, n$). Note that this quantity is defined if and only if the
rank 1 table has all entries $\le \kappa$.

We would like to know if the row margin R and the column margin C are positively
correlated. Intuitively, this means that the revelation of C produces less surprise
(entropy loss) when R is known in advance than when R is not known. In our
random model, this is expressed by the inequality

\[ m J\!\left( \frac{N}{m} \right) \ \ge\ J(r_1) + J(r_2) + \cdots + J(r_m). \qquad (11) \]
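Both sides of the inequality between the entropy lost without and with knowledge of R can be evaluated numerically for given margins. The Python sketch below is a hedged illustration for finite $\kappa$: it takes $J(r)$ to be n times the entropy at mean r/n minus the sum of the entropies at means $r\alpha_j$, with $\alpha_j = c_j/N$ (our reading of the definition above), and computes entropies by bisection on $\ln q$. The function names are our own.

```python
import math

def entropy(x, kappa):
    """Entropy of the truncated geometric law on {0..kappa} with mean x."""
    if x <= 0 or x >= kappa:
        return 0.0                       # degenerate point mass
    def stats(u):                        # u = ln q; returns (ln normaliser, mean)
        terms = [t * u for t in range(kappa + 1)]
        top = max(terms)
        w = [math.exp(v - top) for v in terms]
        return top + math.log(sum(w)), sum(t * wt for t, wt in enumerate(w)) / sum(w)
    lo, hi = -60.0, 60.0                 # assumed bracket for ln q
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if stats(mid)[1] < x else (lo, mid)
    u = (lo + hi) / 2
    return stats(u)[0] - x * u           # equals -(ln p + x ln q)

def J(r, alphas, kappa):
    """Entropy loss function for weights alphas (assumed to sum to 1)."""
    n = len(alphas)
    return n * entropy(r / n, kappa) - sum(entropy(r * a, kappa) for a in alphas)

def inequality_sides(R, C, kappa):
    """Return (m * J(N/m), J(r_1) + ... + J(r_m)) with alpha_j = c_j / N."""
    N, m = sum(R), len(R)
    alphas = [c / N for c in C]
    return m * J(N / m, alphas, kappa), sum(J(r, alphas, kappa) for r in R)

lhs_small_kappa, rhs_small_kappa = inequality_sides((1, 2), (2, 1), 2)
lhs_large_kappa, rhs_large_kappa = inequality_sides((1, 2), (2, 1), 30)
```

For these particular margins the left side is smaller than the right side when $\kappa = 2$, where the margins are not small relative to $\kappa$, and larger when $\kappa = 30$. This is consistent with a positive correlation result that is asserted only for sufficiently small margins, but a single example of this size proves nothing either way.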

Now let us see if we can make a formal argument out of the intuition. 
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Theorem 131 part (i), implies that 

^ m n 

iimli„r4fi"),c<->) > 

i=l j=l 

again, the rank 1 matrix is chosen here purely out of convenience (we could have
substituted any $Z \in \Pi_k(R, C)$). Together with part (ii), this means that in order to
prove

\[
\lim_{s \to \infty} \frac{1}{s} \ln T_k\left(R^{(s)}, C^{(s)}\right) \ge \lim_{s \to \infty} \frac{1}{s} \ln I_k\left(R^{(s)}, C^{(s)}\right),
\]

it suffices to show that



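\[
\sum_{i=1}^{m} \sum_{j=1}^{n} H^{(k)}\left(\frac{r_i c_j}{N}\right) \ge m \sum_{j=1}^{n} H^{(k)}\left(\frac{c_j}{m}\right) + n \sum_{i=1}^{m} H^{(k)}\left(\frac{r_i}{n}\right) - mn\, H^{(k)}\left(\frac{N}{mn}\right),
\]

or, equivalently (by rearranging terms),

\[
mn\, H^{(k)}\left(\frac{N}{mn}\right) - m \sum_{j=1}^{n} H^{(k)}\left(\frac{c_j}{m}\right) \ge n \sum_{i=1}^{m} H^{(k)}\left(\frac{r_i}{n}\right) - \sum_{i=1}^{m} \sum_{j=1}^{n} H^{(k)}\left(\frac{r_i c_j}{N}\right).
\]

This is precisely what is asserted by the inequality (11). In order to prove Theorem 4,
we will show that under its hypotheses, (11) is true (and meaningful, i.e. the rank 1 matrix has
entries $\le k$).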

3.4. Proof of Theorem 4. Assume the hypotheses of Theorem 4; note in particular
that $k \ge 2$. Let

\[
\alpha_j := \frac{c_j}{N} \qquad (1 \le j \le n).
\]

Consider the function

\[
\phi(x) := x^2 \left(H^{(k)}\right)''(x) = -x^2\, \frac{q'(x; k)}{q(x; k)}
\]

(all derivatives being with respect to $x$; the second equality follows from Lemma 5(iii)).

The above formula defines $\phi(x)$ only for $0 < x < k$, but we claim that $\phi(x)$ can be
extended analytically to a neighborhood of $x = 0$.

Proof of claim: Equations (1) and (2) (section 1.3) yield

\[
x = \frac{q + 2q^2 + \cdots + k q^k}{1 + q + q^2 + \cdots + q^k},
\]

where $q = q(x; k)$. Although this formula has only been assigned meaning for $q > 0$,
it shows that $x$ (as a function of $q$) can be extended analytically to a neighborhood
of $q = 0$; the Maclaurin series is $x = q + q^2 + O(q^3)$. Since $\frac{dx}{dq} \ne 0$ at $q = 0$, it follows
that the inverse function $q(x; k)$ is also defined and analytic in a neighborhood of
$x = 0$, with Maclaurin series $q = x - x^2 + O(x^3)$. Applying l'Hôpital's rule, we see
that the singularity of $\phi$ at $x = 0$ is removable, so $\phi(x)$ is locally analytic there.
□
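The two Maclaurin series in this proof can be sanity-checked numerically. The sketch below is a hedged illustration: it takes $x$ to be the mean of the law on $\{0, \dots, k\}$ with weights proportional to $q^j$, inverts that relation by bisection, and inspects the remainder $q - (x - x^2)$. The function names are hypothetical, and the final lines contrast the case $k = 1$, where the sign of the quadratic term flips.

```python
import math

def tg_mean(q, k):
    """x as a function of q: (q + 2q^2 + ... + k q^k) / (1 + q + ... + q^k)."""
    weights = [q ** j for j in range(k + 1)]
    return sum(j * w for j, w in enumerate(weights)) / sum(weights)

def tg_q(x, k):
    """Numerical inverse of tg_mean in q (bisection on log q, for 0 < x < k)."""
    lo, hi = -30.0, 30.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if tg_mean(math.exp(mid), k) < x:
            lo = mid
        else:
            hi = mid
    return math.exp((lo + hi) / 2)

for k in (2, 3, 10):
    for x in (0.02, 0.01, 0.005):
        remainder = tg_q(x, k) - (x - x * x)
        # If q = x - x^2 + O(x^3), this ratio should stay bounded as x -> 0.
        print(k, x, remainder / x ** 3)

# Contrast: for k = 1 the mean is q / (1 + q), so q = x / (1 - x) = x + x^2 + ...
x = 0.01
print(tg_q(x, 1) - x, x * x)
```

The $k = 1$ contrast shows why the hypothesis $k \ge 2$ matters: only then does the expansion of $q$ begin $x - x^2$.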

We compute the Maclaurin series of $\phi(x)$:

\[
\phi(x) = -x \cdot \frac{1 - 2x + O(x^2)}{1 - x + O(x^2)} = -x + x^2 + O(x^3).
\]

Since the coefficient of $x^2$ is positive, $\phi(x)$ is strictly convex in a neighborhood
of $x = 0$. Choose $\delta \in (0, 1)$ such that $\phi(x)$ is strictly convex in the interval
$|x| < \delta k$.
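The local convexity of $\phi$ near the origin can be illustrated numerically. The sketch below takes $\phi(x)$ to be $x^2$ times the second derivative of the entropy of $TG(x; k)$, and evaluates it as $-x^2/\mathrm{Var}$. That evaluation rests on the standard exponential-family identity $dx/d(\ln q) = \mathrm{Var}$, which is an assumption imported for this illustration rather than something stated in the text. All function names are hypothetical.

```python
import math

def moments(log_q, k):
    """Mean and variance of the law on {0,...,k} with Pr[X = j] ~ q**j."""
    w = [math.exp(j * log_q) for j in range(k + 1)]
    z = sum(w)
    mean = sum(j * wj for j, wj in enumerate(w)) / z
    second = sum(j * j * wj for j, wj in enumerate(w)) / z
    return mean, second - mean * mean

def phi(x, k):
    """phi(x) = x^2 * H''(x), evaluated as -x^2 / Var (assumed identity)."""
    lo, hi = -30.0, 30.0            # bisection on log q so that the mean is x
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if moments(mid, k)[0] < x else (lo, mid)
    return -x * x / moments((lo + hi) / 2, k)[1]

def second_difference(f, x, h):
    """Central second difference, a finite-difference stand-in for f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

k = 2
for x in (0.01, 0.02):
    series = -x + x * x             # truncated Maclaurin series of phi
    curv = second_difference(lambda t: phi(t, k), x, 0.005)
    print(x, phi(x, k), series, curv)
```

For small $x$ the computed values track $-x + x^2$ and the second difference is positive, consistent with strict convexity near $0$; the sketch does not attempt to locate the largest admissible $\delta$.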

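Because $\delta < 1$, $J(r)$ (see definition (10)) is defined and differentiable at $r = r_1, \dots, r_m$.
Differentiating, we have

\[
J'(r) = \sum_{j=1}^{n} \left[ \frac{1}{n} \left(H^{(k)}\right)'\left(\frac{r}{n}\right) - \alpha_j \left(H^{(k)}\right)'(\alpha_j r) \right]
\]

and

\[
J''(r) = \sum_{j=1}^{n} \left[ \frac{1}{n^2} \left(H^{(k)}\right)''\left(\frac{r}{n}\right) - \alpha_j^2 \left(H^{(k)}\right)''(\alpha_j r) \right] = \frac{1}{r^2} \sum_{j=1}^{n} \left[ \phi\left(\frac{r}{n}\right) - \phi(\alpha_j r) \right].
\]

By the convexity of $\phi(x)$, we have $J''(r) \le 0$ for $0 < r < \frac{\delta k}{\max_j \alpha_j}$; the inequality
is strict if $\alpha_1, \dots, \alpha_n$ are not all equal. Therefore, $J(r)$ is concave on (the closure of)
that interval, and strictly concave if $\alpha_1, \dots, \alpha_n$ are not all equal. By our assumption
that $r_i c_j \le \delta k N$, it follows that $r_1, \dots, r_m$ are in that interval.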

Thus, inequality (11) holds:

\[
mJ\left(\frac{N}{m}\right) \ge J(r_1) + J(r_2) + \cdots + J(r_m),
\]

with strict inequality if $\alpha_1, \dots, \alpha_n$ are not all equal and $r_1, \dots, r_m$ are also not all
equal.

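When the function $J$ is evaluated throughout this inequality, we obtain

\[
mn\, H^{(k)}\left(\frac{N}{mn}\right) - m \sum_{j=1}^{n} H^{(k)}\left(\frac{c_j}{m}\right) \ge n \sum_{i=1}^{m} H^{(k)}\left(\frac{r_i}{n}\right) - \sum_{i=1}^{m} \sum_{j=1}^{n} H^{(k)}\left(\frac{r_i c_j}{N}\right).
\]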



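As explained at the end of section 3.3, this implies the last link, and Theorem 3
implies the first, in this chain of inequalities:

\[
\begin{aligned}
\lim_{s \to \infty} \frac{1}{s} \ln T_k\left(R^{(s)}, C^{(s)}\right)
&\ge \max_{Z \in \Pi_k(R, C)} \sum_{i=1}^{m} \sum_{j=1}^{n} H^{(k)}(z_{ij}) \\
&\ge \sum_{i=1}^{m} \sum_{j=1}^{n} H^{(k)}\left(\frac{r_i c_j}{N}\right) \\
&\ge \lim_{s \to \infty} \frac{1}{s} \ln I_k\left(R^{(s)}, C^{(s)}\right).
\end{aligned}
\]

If $\alpha_1, \dots, \alpha_n$ are not all equal and $r_1, \dots, r_m$ are not all equal, then the last inequality
in this chain is strict. This completes the proof of Theorem 4. ■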

3.5. Prospects for negative correlation of margins. Recall that for $k = 1$, all
pairs of margins $(R, C)$ have either zero or negative asymptotic correlation (specifically,
negative unless either $R$ or $C$ is a constant vector). For $k = \infty$, the sign
of correlation is reversed. We expect that these are the only "pure" cases: that
is, when $1 < k < \infty$, there are some positively correlated pairs of margins as well
as some negatively correlated pairs. Theorem 4 asserts half of this conjecture: for
$k \ge 2$, any sufficiently sparse margins are asymptotically positively correlated. (By
symmetry, "co-sparse margins", those which force most entries to be close to $k$, are
also positively correlated.)

Numerical evidence and heuristic arguments suggest that, for all $k < \infty$, margins
which are neither sparse nor co-sparse (or more specifically, close to
$R = \left(\frac{kn}{2}, \dots, \frac{kn}{2}\right)$ and $C = \left(\frac{km}{2}, \dots, \frac{km}{2}\right)$) are negatively correlated. For example, we
have used Theorem 3 to compute

\[
\lim_{s \to \infty} \frac{1}{s} \left[ \ln T_k\left(R^{(s)}, C^{(s)}\right) - \ln I_k\left(R^{(s)}, C^{(s)}\right) \right] \tag{12}
\]

for margins of the form $R = C = (\gamma, \gamma + \epsilon, \gamma + 2\epsilon, \dots, \gamma + (n - 1)\epsilon)$, $n = 2, 3, 4, 5$.
In these tests, when $k = 2$ and $\epsilon = .02$, expression (12) turns out to be negative
(indicating negative correlation of the margins) roughly when $.09n < \gamma < 1.89n$,
suggesting a threshold of $\delta \approx .05$. When $k = 10$ and $\epsilon = .1$, expression (12) is
negative roughly when $1.5n < \gamma < 8.5n$, suggesting $\delta \approx .15$. As $k \to \infty$, the sharp
value of $\delta$ in Theorem 4 appears to grow, but a threshold remains.

An intuitive gloss on this phenomenon is that the distribution $TG(x; k)$ "looks like"
a geometric distribution when $x \approx 0$ (or $x \approx k$), but looks more like a Bernoulli
distribution when $x$ is at neither extreme. In the former case, the "lid" $k$ (or the floor
0) is remote from typical values, so the behavior observed when $k = \infty$ dominates.
In the latter case, the $k = 1$ behavior seems to dominate.

The fundamental difference between these cases is suggested by the function $\phi(x)$
in the proof of Theorem 4. When $k = \infty$, this function is convex throughout its
domain; when $k = 1$, it is concave; and when $1 < k < \infty$, this function is convex
near the origin, but has an inflection point.
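The three regimes can be illustrated with a short numerical experiment. As a hedged sketch, $\phi(x)$ is again evaluated as $-x^2/\mathrm{Var}$, assuming the identity $dx/d(\ln q) = \mathrm{Var}$. For $k = \infty$ the distribution is geometric with variance $x(1 + x)$, which gives $\phi(x) = -x/(1 + x)$; for $k = 1$ it is Bernoulli with variance $x(1 - x)$, which gives $\phi(x) = -x/(1 - x)$. These closed forms are worked out for this illustration and are not quoted from the paper; the function names are hypothetical.

```python
import math

def phi(x, k):
    """phi(x) = -x^2 / Var for the law on {0,...,k} with Pr[X=j] ~ q**j, mean x."""
    def moments(log_q):
        w = [math.exp(j * log_q) for j in range(k + 1)]
        z = sum(w)
        mean = sum(j * wj for j, wj in enumerate(w)) / z
        return mean, sum(j * j * wj for j, wj in enumerate(w)) / z - mean * mean
    lo, hi = -30.0, 30.0            # bisection on log q so that the mean is x
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if moments(mid)[0] < x else (lo, mid)
    return -x * x / moments((lo + hi) / 2)[1]

def phi_geometric(x):
    """k = infinity: variance x(1 + x), hence phi(x) = -x / (1 + x)."""
    return -x / (1 + x)

def curvature(f, x, h=0.005):
    """Central second difference: positive suggests convex, negative concave."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

print("k=1  :", [round(curvature(lambda t: phi(t, 1), x), 3) for x in (0.1, 0.5, 0.9)])
print("k=inf:", [round(curvature(phi_geometric, x), 3) for x in (0.1, 0.5, 0.9)])
print("k=2  :", [round(curvature(lambda t: phi(t, 2), x), 3) for x in (0.01, 0.5, 1.0)])
```

In this experiment the curvature is negative throughout for $k = 1$, positive throughout for $k = \infty$, and for $k = 2$ it is positive near the origin but negative at $x = 1$, so the sign changes somewhere in between.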

We can show that $\phi(x)$ is concave for $x \approx \frac{k}{2}$, so what are the obstacles to a reverse
Theorem 4? There are two. In the proof of Theorem 4, we relied on the fact
that

\[
\max_{Z \in \Pi_k(R, C)} \sum_{i=1}^{m} \sum_{j=1}^{n} H^{(k)}(z_{ij}) \ge \sum_{i=1}^{m} \sum_{j=1}^{n} H^{(k)}\left(\frac{r_i c_j}{N}\right), \tag{13}
\]

allowing us to use the rank 1 matrix $\left(\frac{r_i c_j}{N}\right)$ as a proxy for the unknown $Z$
which achieves the maximum. This matrix does not necessarily have entries $\le k$;
we were able to assume that it does only because our assumption of sparse margins
did double duty. This is the first obstacle to a reverse Theorem 4; the second is
that, even if we could find another plausible candidate for $Z$, we could not make
an assumption like (13) with the inequality reversed. Thus, to prove a negative
correlation result, we believe it is crucial to understand something about where the
maximum on the left-hand side of (13) is achieved.

4. Appendix 

This section contains some matter which was deferred from earlier sections. 

4.1. Proof of Lemma 4. Claim (i) is trivial, since we can set $p = 1$, $\alpha_1 = 1$,
$R^1 = R$, $C^1 = C$ in Definition 8.

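For claim (ii), it suffices to show that if $\alpha + \beta = 1$, then

\[
\alpha f(R^1, C^1) + \beta f(R^2, C^2) \le f(\alpha R^1 + \beta R^2, \alpha C^1 + \beta C^2). \tag{14}
\]

By Definition 8, there exist $\gamma_1, \dots, \gamma_p > 0$; $R^{11}, \dots, R^{1p}$; and $C^{11}, \dots, C^{1p}$ such
that

\[
\sum_{t=1}^{p} \gamma_t = 1, \qquad \sum_{t=1}^{p} \gamma_t R^{1t} = R^1, \qquad \sum_{t=1}^{p} \gamma_t C^{1t} = C^1,
\]

\[
\text{and} \quad f(R^1, C^1) = \sum_{t=1}^{p} \gamma_t \ln T_K(R^{1t}, C^{1t}).
\]

Likewise, there exist $\delta_1, \dots, \delta_q > 0$; $R^{21}, \dots, R^{2q}$; and $C^{21}, \dots, C^{2q}$ such that

\[
\sum_{t=1}^{q} \delta_t = 1, \qquad \sum_{t=1}^{q} \delta_t R^{2t} = R^2, \qquad \sum_{t=1}^{q} \delta_t C^{2t} = C^2,
\]

\[
\text{and} \quad f(R^2, C^2) = \sum_{t=1}^{q} \delta_t \ln T_K(R^{2t}, C^{2t}).
\]

Note that

\[
\sum_{t=1}^{p} \alpha \gamma_t + \sum_{t=1}^{q} \beta \delta_t = 1,
\]

\[
\sum_{t=1}^{p} \alpha \gamma_t R^{1t} + \sum_{t=1}^{q} \beta \delta_t R^{2t} = \alpha R^1 + \beta R^2,
\]

\[
\text{and} \quad \sum_{t=1}^{p} \alpha \gamma_t C^{1t} + \sum_{t=1}^{q} \beta \delta_t C^{2t} = \alpha C^1 + \beta C^2;
\]

applying Definition 8 to $f(\alpha R^1 + \beta R^2, \alpha C^1 + \beta C^2)$, we obtain inequality (14) and
thus claim (ii).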

It is clear that $f$ is defined only on the convex hull of all $(R, C)$ for which $T_K(R, C) > 0$;
this region is a subset of $\Pi_K(R, C)$, proving claim (iii). □

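4.2. Limit theorems of Lyapunov and Bender. We make use of the following
theorems in the proof of Lemma 6.

Theorem 6 (Lyapunov's central limit theorem). Suppose that $(X_n : n \in \mathbb{N})$ is a
sequence of independent random variables, such that $\mu_n := E[X_n]$ and $\sigma_n^2 := \mathrm{Var}[X_n]$
are finite. Let $C_n = X_1 + \cdots + X_n$, and define $m_n := E[C_n] = \mu_1 + \cdots + \mu_n$,
$s_n^2 := \mathrm{Var}[C_n] = \sigma_1^2 + \cdots + \sigma_n^2$. If

\[
\lim_{n \to \infty} \frac{1}{s_n^{2+\delta}} \sum_{k=1}^{n} E\left[ |X_k - \mu_k|^{2+\delta} \right] = 0
\]

for some $\delta > 0$, then

\[
\lim_{n \to \infty} \Pr\left[ C_n < s_n x + m_n \right] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2} \, dt
\]

for all $x \in \mathbb{R}$.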

Theorem 7 (Bender local limit theorem). Suppose that $(C_n : n \in \mathbb{N})$ is a sequence
of integer-valued random variables and $(\sigma_n)$ and $(\mu_n)$ are sequences of real numbers,
such that

\[
\lim_{n \to \infty} \Pr\left[ C_n < \sigma_n x + \mu_n \right] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2} \, dt
\]

for all $x \in \mathbb{R}$. Also suppose that $\sigma_n \to \infty$ as $n \to \infty$. Further, suppose that, for
every $n$, the sequence $b_n(t) := \Pr(C_n = t)$ is properly log-concave with respect to $t$.
Then

\[
\lim_{n \to \infty} \sigma_n \Pr\left[ C_n = \lfloor \sigma_n x + \mu_n \rfloor \right] = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}
\]

uniformly for all $x \in \mathbb{R}$.
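The conclusion of the local limit theorem can be illustrated on a small example. The sketch below is illustrative only: it computes the exact distribution of a sum of $n$ independent truncated geometric variables by repeated convolution, then compares $\sigma_n \Pr[C_n = \lfloor \sigma_n x + \mu_n \rfloor]$ with the Gaussian density. The parameters $q = 0.5$, $k = 2$, $n = 400$ and the function name `pmf_of_sum` are hypothetical choices. A sum of independent log-concave integer variables is again log-concave, which is why this example is expected to fall within the scope of the theorem.

```python
import math

def pmf_of_sum(pmf, n):
    """Exact pmf of the sum of n i.i.d. copies of an integer law (convolution)."""
    dist = [1.0]
    for _ in range(n):
        new = [0.0] * (len(dist) + len(pmf) - 1)
        for i, a in enumerate(dist):
            for j, b in enumerate(pmf):
                new[i + j] += a * b
        dist = new
    return dist

# Hypothetical summands: truncated geometric on {0, 1, 2} with q = 0.5.
q, k, n = 0.5, 2, 400
z = sum(q ** j for j in range(k + 1))
pmf = [q ** j / z for j in range(k + 1)]
mu1 = sum(j * p for j, p in enumerate(pmf))
var1 = sum(j * j * p for j, p in enumerate(pmf)) - mu1 ** 2

dist = pmf_of_sum(pmf, n)
mu_n, sigma_n = n * mu1, math.sqrt(n * var1)

for x in (-1.0, 0.0, 1.0):
    t = math.floor(sigma_n * x + mu_n)
    lhs = sigma_n * dist[t]                                  # scaled point mass
    rhs = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)      # Gaussian density
    print(x, lhs, rhs)
```

At this value of $n$ the two columns agree to within a few hundredths; the remaining gap comes from rounding to the lattice and from the skewness of the summands, both of which shrink as $n$ grows.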



The Lyapunov theorem is well known; for a proof, see e.g. [11]. The Bender theorem
originally appeared in [7]; see also [10], which states and proves the theorem in a
form more similar to the above.